Abstract


We discuss a fast algorithm for the linear programming relaxation of the Multiple Choice Knapsack Problem. Let $N$ be the total number of variables in this problem and let $J$ and $J_{\max}$ denote the total number of multiple choice variables and the cardinality of the largest multiple choice set, respectively. The running time of the algorithm is then bounded by $O(J \log J_{\max}) + O(N)$. Under certain conditions it is possible to reduce this bound to $O(N)$ steps on the average. Possible further improvements are also discussed.




I. Introduction


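Consider the following LP/Multiple-Choice Knapsack Problem (LMCK):

$$\text{(LMCK)} \qquad Z^* = \text{minimize} \sum_{j \in N} c_j x_j \tag{1}$$

subject to

$$\sum_{j \in N} a_j x_j = a_0 \tag{2}$$

$$\sum_{j \in J_k} x_j = 1, \qquad k \in K \tag{3}$$

$$x_j \ge 0,\ j \in N; \qquad x_j \le 1,\ j \in J \cup I', \quad I' \subseteq N \setminus \bigcup_{k \in K} J_k \tag{4}$$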


where the multiple choice sets, $J_k$, $k \in K$, are mutually disjoint. Let $J = \bigcup_{k \in K} J_k$ and $I = N \setminus J$. We refer to the variables of $J$ as multiple choice (or GUB) variables and to those of $I'$ as simple upper bounded ones.

(LMCK) is a special case of the general (LP) problem with generalized GUB or VUB constraints, which has been studied extensively (see, for instance, a recent paper by Schrage [11]). Its main application is as a relaxation for the integer multiple choice knapsack problem [9], [13], which is a useful model for various real life problems. In addition, as pointed out by Witzgall [14], an efficient algorithm for (LMCK) can be used to accelerate the solution of ordinary LP/GUB problems by the dual simplex algorithm. The reader may note that several generalizations of (LMCK) fall within the scope of the model presented here. For instance, arbitrary positive upper bounds in (4), as well as arbitrary coefficients in the multiple choice constraints (3), can be handled by normalization.


Negative coefficients in (3) can be handled by complementing the variable in question relative to an artificially set large upper bound. In such cases the optimal solution of (LMCK) must be checked for unboundedness of the original problem.


Thus the two-constraint linear programming problem can be viewed as a special case of (LMCK).

(LMCK) is equivalent, but not identical, to the problem treated recently by Glover and Klingman [6], which in turn is a slight generalization of the problems studied by Sinha and Zoltners [13] and Witzgall [14]. The difference between the model presented here and the one of [6] is the introduction of the individual upper bounds (4). [13] and [14] do not allow for any variables outside of the multiple choice constraints. (Using the notation introduced earlier, the model presented in [6] corresponds to the case $I' = \emptyset$, while those of [13] and [14] correspond to $I = \emptyset$.) Upper bounded variables can be accommodated by the algorithms of [6], [13] and [14] by treating each such variable, together with its slack, as a multiple choice constraint. However, this convention tends to increase both $N$ and $K$, and with them the computational effort. In contrast, the algorithm proposed here works in the opposite direction, i.e., it converts multiple choice variables into simple upper bounded ones.


Thus, it takes full advantage of the existence of variables in $I$.

The algorithms of [6], [13], and [14], as well as the one presented here, are basically two-phase procedures. The role of Phase I is to identify the lower convex boundary of each of the multiple choice sets (see Propositions 1 and 2 below). It is well known that this task can be accomplished in $O(J \log J_{\max})$ steps, where $J_{\max}$ denotes the cardinality of the largest multiple choice set.


The distinction between $I$ and $I'$ is of little consequence from a computational point of view since, as pointed out by Glover and Klingman [6], all but at most 2 variables of $I \setminus I'$ can be trivially eliminated at the outset.
Throughout this paper we let the same symbol stand for a set and for its cardinality.


Phase II of the above mentioned algorithms is an iterative improvement procedure. As mentioned earlier, the algorithms of [6], [13], and [14] treat the variables of $I$ as additional multiple choice constraints. Let $K^* = K + I$ be their total number under this convention. The complexity of Witzgall's Phase II is then $O(K^*(N - K))$, while Glover and Klingman's is $O(N \log K^*)$. Sinha and Zoltners do not give complexity estimates, but their procedure is of a similar type; its precise performance may match either of these bounds, depending on some unspecified details of implementation.

A particularly interesting special case of (LMCK) is the knapsack problem (LKP), which corresponds to the case $J = \emptyset$. An $O(N)$ algorithm for this problem is given in [1], [2], [8]. Although any of the algorithms of [6], [13], and [14] can be specialized to solve (LKP), the resulting procedure is not an $O(N)$ algorithm for (LKP). This discrepancy brings to mind the question, posed by Glover and Klingman [6], of whether one can design an algorithm for (LMCK) which retains (or even exceeds) the efficiency achieved by the algorithms of [6], [13] or [14], while specializing to an $O(N)$ procedure when the instance of (LMCK) corresponds to (LKP). In this paper we settle this question in the affirmative. More specifically, our algorithm uses the calculations of Phase I in order to convert (LMCK) into (LKP). This conversion is done without increasing any of the problem parameters (such as the number of variables or the size of the coefficients) and takes a negligible computational effort. Since (LKP) is known to be of complexity $O(N)$, the overall complexity of our algorithm is


$$P = O(J \log J_{\max}) + O(N) \tag{5}$$

Expressed in terms of $N$ alone, it is clear that there exist positive constants $c_1$, $c_2$ such that

$$c_1 N \le P \le c_2 N \log N.$$

For instance, $P = O(N)$ if there exists a constant $c_3$ such that $J_{\max} \le c_3$, or if for every positive constant $c_4$ there exists $N^*$ such that $c_4 N \ge J \log J_{\max}$ for $N > N^*$.

We present the transformation of (LMCK) into a knapsack problem in Section II. In Section III we discuss two possibilities for improving the bound (5). The first is the "Divide and Conquer" algorithm of Bentley and Shamos [3], which can be used to speed up the computation of Phase I. Under certain conditions this approach yields an algorithm for (LMCK) whose expected running time is $O(N)$. The second improvement, directed at Phase II, is based on a generalization of ideas due to Jefferson, Shamos and Tarjan [12], Johnson and Mizoguchi [8], and Galil and Megiddo [5]. It is particularly relevant for cases in which Phase I can be avoided entirely. This may arise, for instance, if each of the sets $J_k$, $k \in K$, arises from piecewise linearization of a certain (one dimensional) convex function. In such cases, it is sometimes possible to solve (LMCK) in sublinear (worst case) effort.

II. The Transformation

Several properties of an optimal solution for (LMCK) are discussed in [6], [13] and [14]. Propositions 1-3 below are straightforward generalizations of these properties.


Proposition 1 [6], [13], [14]. Let $i, j \in J_l$, $i \ne j$, for some $l \in K$, with $a_i = a_j$ and $c_i \ge c_j$. Then there exists an optimal solution to (LMCK) with $x_i = 0$.

Proposition 2 [6], [13], [14]. Let $i, j, k \in J_l$ for some $l \in K$, with $a_i < a_j < a_k$ and

$$\frac{c_j - c_i}{a_j - a_i} \ge \frac{c_k - c_j}{a_k - a_j}.$$

Then there exists an optimal solution for (LMCK) with $x_j = 0$.

It will be convenient to think of a given multiple choice set, $J_l$, as a set of points, $\{(a_j, c_j)\}$, $j \in J_l$, in a two dimensional space. Using Proposition 2 we can eliminate from such a set all but those variables which define its lower convex boundary. Let $J'_k \subseteq J_k$ be the set of remaining variables and let $J' = \bigcup_{k \in K} J'_k$, $N' = I \cup J'$. Proposition 1 ensures that $a_i \ne a_j$ for $i, j \in J'_k$, $i \ne j$.

There are several available algorithms which can be used to identify the sets $J'_k$, $k \in K$ (e.g., [7], [10]). The approach taken in [6], [13] and [14] is based on first sorting the variables of each set according to increasing $a_j$ values. The sorted sets are then scanned, and variables which violate Propositions 1 or 2 are purged. The computational complexity of this procedure, as is the case for most other techniques which identify the convex hull of a set of points in the plane, is determined by the sorting phase. Since sorting a set of size $J_k$ requires $O(J_k \log J_k)$ operations, the overall complexity of Phase I is given by

$$O\Big(\sum_{k \in K} J_k \log J_k\Big) \le O(J \log J_{\max}).$$
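The sort-and-scan procedure just described can be sketched in Python as follows. This is a minimal illustration, not the implementation of [6], [13] or [14]; the function name `lower_convex_boundary` and the `(a_j, c_j, j)` input format are our own assumptions. The slope comparison of Proposition 2 is cross-multiplied, which is valid because the points are sorted by increasing $a_j$ and points with equal $a_j$ are removed first.

```python
from typing import List, Tuple


def lower_convex_boundary(points: List[Tuple[float, float, int]]) -> List[int]:
    """Return the indices j of the points (a_j, c_j, j) that survive
    Propositions 1 and 2, in order of increasing a_j."""
    # Sort by a_j; among equal a_j the cheaper point comes first.
    pts = sorted(points, key=lambda p: (p[0], p[1]))
    hull: List[Tuple[float, float, int]] = []
    for a, c, j in pts:
        # Proposition 1: a cheaper point with the same a_j is already kept.
        if hull and hull[-1][0] == a:
            continue
        # Proposition 2: drop the middle point while slope(i, m) >= slope(m, new).
        while len(hull) >= 2:
            (a1, c1, _), (a2, c2, _) = hull[-2], hull[-1]
            if (c2 - c1) * (a - a2) >= (c - c2) * (a2 - a1):
                hull.pop()
            else:
                break
        hull.append((a, c, j))
    return [j for _, _, j in hull]
```

For example, the points $(0,0)$, $(1,2)$, $(2,1)$, $(3,3)$ and $(1,5)$ (indexed 0 to 4) reduce to indices `[0, 2, 3]`: point 4 is removed by Proposition 1 and point 1 by Proposition 2.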
Denote by (LMCK)' the (equivalent) problem obtained from (LMCK) by replacing the sets $J_k$ with $J'_k$, $k \in K$, and $N$ with $N'$. We assume that the multiple choice sets are indexed such that

$$J'_k = \{j_k, j_k + 1, \ldots, i_k\}$$

and

$$a_{j_k} < a_{j_k + 1} < \cdots < a_{i_k}.$$

W.L.O.G. we can assume $J'_k \ge 2$, $k \in K$. By convexity,

$$\frac{c_{j+1} - c_j}{a_{j+1} - a_j} < \frac{c_{j+2} - c_{j+1}}{a_{j+2} - a_{j+1}}, \qquad j = j_k, \ldots, i_k - 2,\ k \in K.$$

Proposition 3 [6], [13], [14]. Any basic optimal solution for (LMCK)', $x$, satisfies:

(6) $x$ has at most two fractional variables;

(7) if $x$ has two fractional variables, they must be adjacent variables within one of the multiple choice constraints;

(8) if $x$ has a unique fractional variable, it must be a non-multiple choice variable.

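We now define a knapsack problem, (LKP), which is equivalent to (LMCK)'. For each multiple choice set, $J'_k$, let $J''_k = J'_k \setminus \{j_k\}$, and let $J'' = \bigcup_{k \in K} J''_k$, $N'' = I \cup J''$. Define

$$d_j = \begin{cases} c_j - c_{j-1} & \text{if } j \in J'' \\ c_j & \text{if } j \in I \end{cases} \tag{9}$$

$$e_j = \begin{cases} a_j - a_{j-1} & \text{if } j \in J'' \\ a_j & \text{if } j \in I \end{cases} \tag{10}$$

and consider the following knapsack problem:

$$\text{(LKP)} \qquad W^* = \text{minimize} \sum_{j \in N''} d_j y_j + \sum_{k \in K} c_{j_k} \tag{11}$$

subject to

$$\sum_{j \in N''} e_j y_j = a_0 - \sum_{k \in K} a_{j_k} \tag{12}$$

$$y_j \ge 0,\ j \in N''; \qquad y_j \le 1,\ j \in J'' \cup I' \tag{13}$$

We denote the right-hand side of (12) by $a'_0$.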
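Building the data of (LKP) from (LMCK)' is mechanical. A hypothetical helper `to_lkp` might look like the sketch below; it assumes each $J'_k$ is given as a list of `(a_j, c_j)` pairs sorted by increasing $a_j$ (the output of Phase I, with at least two items per set), and the returned `origin` bookkeeping is an illustrative choice of ours, not something prescribed by the text.

```python
from typing import List, Tuple

Item = Tuple[float, float]  # (a_j, c_j)


def to_lkp(mc_sets: List[List[Item]], simple: List[Item], a0: float):
    """Build the knapsack data of (LKP) from (LMCK)'.

    mc_sets[k] lists J'_k sorted by increasing a_j; simple lists the
    variables of I. Returns (d, e, rhs, const, origin): rhs is the
    right-hand side of (12), const is the constant sum of c_{j_k} in the
    objective, and origin[t] records which variable LKP variable t stands for.
    """
    d: List[float] = []
    e: List[float] = []
    origin: List[tuple] = []
    const = 0.0   # cost of the variables x_{j_k} fixed at 1
    rhs = a0      # a_0 minus the weights of the fixed variables
    for k, items in enumerate(mc_sets):
        a_first, c_first = items[0]
        const += c_first
        rhs -= a_first
        for p in range(1, len(items)):
            (a_prev, c_prev), (a_cur, c_cur) = items[p - 1], items[p]
            d.append(c_cur - c_prev)    # d_j = c_j - c_{j-1}, j in J''
            e.append(a_cur - a_prev)    # e_j = a_j - a_{j-1}, j in J''
            origin.append(("mc", k, p))
    for i, (a, c) in enumerate(simple):
        d.append(c)                     # d_j = c_j, j in I
        e.append(a)                     # e_j = a_j, j in I
        origin.append(("simple", i))
    return d, e, rhs, const, origin
```

Note that every $e_j$ with $j \in J''$ is positive because the $a_j$ are strictly increasing within $J'_k$.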

In words, we have eliminated the first variable, $x_{j_k}$, from each of the multiple choice constraints by setting this variable to 1. The objective and constraint rows have been re-adjusted to record this elimination. The remaining multiple choice variables are then replaced by "difference variables" $y_j$, $j \in J''_k$, the intended relation being $x_j = y_j - y_{j+1}$ (with $y_{j_k} \equiv 1$ and $y_{i_k + 1} \equiv 0$); their role is to enable one to shift from $x_{j_k}$ to other variables of $J'_k$ as possible representatives of this set. It is quite apparent that the variables $y_j$ will function properly only under certain conditions. Indeed, there is no obvious correspondence between the feasible solution set of (LKP) and that of (LMCK)'. However,


Theorem 1. (LKP) is equivalent to (LMCK)' in the following sense:

(i) $W^* = Z^*$.

(ii) Let $y$ be a basic optimal solution for (LKP) and let $f$ denote the index of its basic variable, $0 < y_f < 1$. Then a basic optimal solution for (LMCK)', $x$, can be defined as follows:

$$x_j = y_j, \qquad j \in I. \tag{14}$$

For each $k \in K$ such that $f \notin J'_k$, define

$$h_k = \begin{cases} j_k & \text{if } y_j = 0 \ \ \forall j \in J''_k \\ \max\{j \mid j \in J''_k,\ y_j = 1\} & \text{otherwise} \end{cases}$$

and set

$$x_j = \begin{cases} 1 & j \in J'_k,\ j = h_k \\ 0 & j \in J'_k,\ j \ne h_k. \end{cases} \tag{15}$$

Let $J'_l$ be the unique set such that $f \in J''_l$ (if indeed such a set exists). Set

$$x_j = \begin{cases} y_f & j = f \\ 1 - y_f & j = f - 1 \\ 0 & j \in J'_l,\ j \ne f, f - 1. \end{cases} \tag{16}$$
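Part (ii) of Theorem 1 can be read as a procedure. The sketch below recovers $x$ from an acceptable $y$ (one satisfying the prefix condition (17) of the proof) by applying (14), (15) and (16); it does not verify that condition. The name `lkp_to_lmck` and the per-set list layout are hypothetical.

```python
from typing import List, Tuple


def lkp_to_lmck(y_mc: List[List[float]],
                y_simple: List[float]) -> Tuple[List[List[float]], List[float]]:
    """Map an acceptable (LKP) solution to (LMCK)' following (14)-(16).

    Hypothetical layout: y_mc[k][p - 1] holds y for position p >= 1 of J'_k
    (position 0 is j_k, which has no y variable); y_simple holds y_j, j in I.
    """
    x_simple = list(y_simple)                       # (14): x_j = y_j, j in I
    x_mc: List[List[float]] = []
    for ys in y_mc:
        x = [0.0] * (len(ys) + 1)
        frac = [p for p, v in enumerate(ys, start=1) if 0.0 < v < 1.0]
        if frac:
            # (16): the fractional y_f splits the unit between f - 1 and f.
            f = frac[0]
            x[f] = ys[f - 1]
            x[f - 1] = 1.0 - ys[f - 1]
        else:
            # (15): h_k is the last position with y = 1, or j_k if none.
            ones = [p for p, v in enumerate(ys, start=1) if v == 1.0]
            x[max(ones) if ones else 0] = 1.0
        x_mc.append(x)
    return x_mc, x_simple
```

As a check on the objective: if the first set has costs $(0, 1, 3)$, then $d = (1, 2)$, and $y = (1, 0.25)$ gives $\sum d_j y_j = 1.5$, while the recovered $x = (0, 0.75, 0.25)$ costs $0.75 \cdot 1 + 0.25 \cdot 3 = 1.5$ as well.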


Proof. Call a basic feasible solution to (LMCK)', $x$, acceptable if it satisfies (6), (7) and (8). Call a basic feasible solution to (LKP), $y$, acceptable if it satisfies the following condition:

$$y_j > 0,\ j \in J''_k \implies y_i = 1 \ \ \forall i \in J''_k,\ i < j. \tag{17}$$

The theorem follows from the following three facts:

(i) The transformation defined by (14), (15) and (16) is a one-to-one mapping from the acceptable solutions of (LKP) onto those of (LMCK)'.

(ii) For any acceptable solution, the transformation of (i) preserves the objective function value.

(iii) The optimal basic solutions to both (LKP) and (LMCK)' are acceptable for the corresponding problems.

Q.E.D.


III. Improvements


Let us reconsider the bound (5). As we have already noted, the overall complexity of the algorithm presented in the previous section may get as high as $O(N \log N)$. The bulk of this effort is spent on the execution of Phase I, i.e., on the identification of the sets $J'_k$, $k \in K$. It is well known (e.g., [3]) that a lower bound on the effort involved in this task is given by $\sum_{k=1}^{K} \Omega(J_k \log J_k)$. Thus, any algorithm which is based on Phase I (or its equivalent) will require at least $\Omega(N \log N)$ steps under adverse conditions. The question of whether (LMCK) can be solved without explicitly identifying the sets $J'_k$, $k \in K$, is open.


The foregoing discussion relates to worst case analysis only. Under certain conditions one may do better on the average. For instance, Bentley and Shamos [3] have developed a "Divide and Conquer" algorithm which is particularly efficient if $J'_k \ll J_k$, $k \in K$. More precisely, let $L_k \subseteq J_k$ be a random subset of $J_k$ and let $L'_k$ stand in the same relation to $L_k$ as $J'_k$ does to $J_k$. Since $L_k$ is a random set, so is $L'_k$. Denote by $E(L'_k)$ the expected size of this set. If there exists a constant $p < 1$ such that

$$E(L'_k) = O(L_k^{\,p}), \qquad k \in K, \tag{18}$$

then Bentley and Shamos' algorithm finds $J'_k$ in expected time bounded by $O(J_k)$. This implies, of course, that Phase I, and hence the algorithm as a whole, can be executed in (expected) $O(N)$ steps. (It is noted in [3] that a "Divide and Conquer" approach can yield an algorithm for the two variable linear programming problem whose expected and worst case running times are bounded by $O(N)$ and $O(N \log N)$, respectively. As noted earlier, the dual of this problem, i.e., the two constraint linear programming problem, is a special case of (LMCK).)

Condition (18) is known to hold in quite a variety of situations. For instance, it suffices to assume that the data $\{(c_j, a_j)\}$, $j \in J_k$, are independent and identically distributed and that, for any given variable, the cost and weight coefficients are independent of each other. For details and references see [3].

A final remark concerns the case where each of the sets $J_k$, $k \in K$, arises from a process of piecewise linearization of a certain (one dimensional) function. Such a process often yields the coefficients $a_j$, $j \in J_k$, already sorted in a natural way. This reduces the computational complexity of Phase I to $O(J)$, and that of (LMCK) to $O(N)$.

If the nonlinear functions referred to in the previous paragraph are known to be convex, one can sometimes do even better. We note that in such cases $J'_k = J_k$, $k \in K$, and one can start the computation directly at Phase II (as in [4], pp. 484-486). The knapsack problem that results in such a case has a special structure that can be exploited by generalizing some ideas brought forward by Jefferson, Shamos and Tarjan [12], Johnson and Mizoguchi [8], and Galil and Megiddo [5]. The resulting algorithm is of complexity

$$O(K \log^2(J/K)) + O(I \log(N/I)),$$

which may be less than linear in $N$. This would mean that such problems can be solved in less time than is needed to read in the totality of the data $\{(c_i, a_i)\}$, $i \in N$. Such sublinear behavior is possible, for instance, in an on-line environment where the convex functions which are subject to piecewise linearization are only evaluated when needed.

The algorithm of [1], [2], [8] for (LKP) is based on an iterative step in which the median element of the set $\{d_i/e_i\}$, $i \in N$, is used to reduce the size of $N$ by at least one half. The lion's share of the computational effort is spent on identifying the median. The procedure below uses a certain approximation of the median which is cheaper to calculate but which still guarantees that a significant portion of $N$ (at least one quarter) is disposed of at each iteration. For a statement of the algorithm it is convenient to rename the set $I$ as $J''_0$. At each iteration, for $k = 0, \ldots, K$, let
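$$J^+_k = \{j \in J''_k \mid y_j \text{ was already set to } 1\}$$

$$J^-_k = \{j \in J''_k \mid y_j \text{ was already set to } 0\}$$

$$J^*_k = \{j \in J''_k \mid y_j \text{ is as yet unassigned}\}$$

and for $\# = +, -$, or $*$ let

$$N^{\#} = \bigcup_{k=0}^{K} J^{\#}_k.$$

For a given scalar $\lambda$ consider the following partition of $J^*_k$, $k = 0, \ldots, K$:

$$J^1_k(\lambda) = \{j \in J^*_k \mid d_j/e_j < \lambda\}$$

$$J^2_k(\lambda) = \{j \in J^*_k \mid d_j/e_j = \lambda\}$$

$$J^3_k(\lambda) = \{j \in J^*_k \mid d_j/e_j > \lambda\}$$

For $i = 1, 2, 3$ let

$$N^i(\lambda) = \bigcup_{k=0}^{K} J^i_k(\lambda), \qquad S^i(\lambda) = \sum_{j \in N^i(\lambda)} e_j.$$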


Algorithm KNAPSACK

0. Set $J^+_k = J^-_k = \emptyset$, $J^*_k = J''_k$, $k = 0, \ldots, K$.

1. Choose $\lambda$ as follows:

(a) Let $r_k$ be the median index of each set $R_k = \{d_j/e_j\}$, $j \in J^*_k$, $k = 0, \ldots, K$.

(b) Let $r$ be the weighted median index of the set $R = \{d_{r_k}/e_{r_k}\}$, $k = 0, \ldots, K$, where the weight associated with the $k$th element is the cardinality of the set $J^*_k$.

(c) Let $\lambda = d_r/e_r$.

2. Calculate $S^1(\lambda)$ and $S^2(\lambda)$.

(a) If $S^1(\lambda) \le a'_0 \le S^1(\lambda) + S^2(\lambda)$, stop: $\lambda$ is optimal. An optimal solution, $y$, can be found by setting $y_j = 1$, $j \in N^+ \cup N^1(\lambda)$, $y_j = 0$, $j \in N^- \cup N^3(\lambda)$, and then "filling the knapsack" with variables $y_j$, $j \in N^2(\lambda)$ (any of them, possibly including one at a fractional value).

(b) If $S^1(\lambda) > a'_0$, set $J^-_k = J^-_k \cup J^2_k(\lambda) \cup J^3_k(\lambda)$ and $J^*_k = J^1_k(\lambda)$, $k = 0, \ldots, K$.

(c) If $S^1(\lambda) + S^2(\lambda) < a'_0$, set $J^+_k = J^+_k \cup J^1_k(\lambda) \cup J^2_k(\lambda)$ and $J^*_k = J^3_k(\lambda)$, $k = 0, \ldots, K$, and $a'_0 = a'_0 - (S^1(\lambda) + S^2(\lambda))$.

3. If $N^* > I + K$, go to 1; otherwise go to 4.

4. Solve the remaining knapsack problem using the linear time algorithm of [1], [2], [8].
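Algorithm KNAPSACK can be sketched in Python as below, under assumptions added for brevity: every $e_j > 0$, every variable carries the upper bound 1, the sets $J^*_k$, $k \ge 1$, are kept as lists sorted by $d_j/e_j$, and $J^*_0$ is simply sorted whenever its median is needed. The sketch also iterates until step 2(a) applies instead of switching to step 4, and its per-iteration work is linear in $N^*$; it illustrates the choice of $\lambda$ and the case analysis of step 2, not the complexity bound derived in the text. The function name `knapsack` is our own.

```python
from typing import Dict, List


def knapsack(d: List[float], e: List[float], sets: List[List[int]],
             b: float) -> Dict[int, float]:
    """Sketch of Algorithm KNAPSACK for
        minimize sum d_j y_j  s.t.  sum e_j y_j = b,  0 <= y_j <= 1.

    Assumed (not stated in the text): every e_j > 0 and every variable is
    bounded by 1; sets[0] holds the variables of I in any order, and each
    sets[k], k >= 1, lists its variables in increasing order of d_j / e_j.
    """
    def ratio(j: int) -> float:
        return d[j] / e[j]

    y: Dict[int, float] = {}
    active = [list(s) for s in sets]            # the sets J*_k
    while any(active):
        # Step 1(a): median ratio of each nonempty J*_k. Sets k >= 1 are
        # sorted; J*_0 is sorted here only to keep the sketch short.
        meds = []
        for k, s in enumerate(active):
            if s:
                ordered = s if k > 0 else sorted(s, key=ratio)
                meds.append((ratio(ordered[len(ordered) // 2]), len(s)))
        # Steps 1(b)-(c): weighted median of the medians, weight |J*_k|.
        meds.sort()
        total, acc = sum(w for _, w in meds), 0
        for lam, w in meds:
            acc += w
            if 2 * acc >= total:
                break
        # Step 2: split every J*_k around lam and compute S1, S2.
        low = [[j for j in s if ratio(j) < lam] for s in active]
        eq = [[j for j in s if ratio(j) == lam] for s in active]
        high = [[j for j in s if ratio(j) > lam] for s in active]
        s1 = sum(e[j] for s in low for j in s)
        s2 = sum(e[j] for s in eq for j in s)
        if s1 <= b <= s1 + s2:                  # 2(a): lam is optimal
            for s in low:
                for j in s:
                    y[j] = 1.0
            for s in high:
                for j in s:
                    y[j] = 0.0
            rest = b - s1                       # fill the knapsack from eq
            for s in eq:
                for j in s:
                    y[j] = max(0.0, min(1.0, rest / e[j]))
                    rest -= y[j] * e[j]
            return y
        if s1 > b:                              # 2(b): eq and high go to 0
            for s in eq + high:
                for j in s:
                    y[j] = 0.0
            active = low
        else:                                   # 2(c): low and eq go to 1
            for s in low + eq:
                for j in s:
                    y[j] = 1.0
            b -= s1 + s2
            active = high
    raise ValueError("no feasible solution: b is outside [0, sum of e_j]")
```

Each pass fixes at least the variable that defines $\lambda$, so the loop always terminates.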

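To assess the computational complexity of algorithm KNAPSACK, we note that each iteration reduces the size of $N^*$ by at least one quarter. Indeed, the sets whose median ratio is at most $\lambda$ carry at least half of the total weight in step 1(b), and at least half of the elements of each such set have $d_j/e_j \le \lambda$; hence at least $N^*/4$ elements satisfy $d_j/e_j \le \lambda$ and, symmetrically, at least $N^*/4$ satisfy $d_j/e_j \ge \lambda$. Step 2(b) removes the latter group from $N^*$ and step 2(c) the former. The number of iterations through steps 1, 2 and 3 is then bounded by $O(\log(N/(K+I)))$.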


The effort involved in each iteration is as follows:

(i) For each of the sets $J^*_k$, $k = 1, \ldots, K$, one can find $r_k$, calculate the contribution of $J^*_k$ to $S^1(\lambda)$ and $S^2(\lambda)$, and form the sets $J^i_k(\lambda)$, $i = 1, 2, 3$, in an effort bounded by $O(\log J^*_k) \le O(\log J_k)$.

(ii) The same tasks can be accomplished for $J^*_0$ in $O(J^*_0) \le O(I)$ steps.

(iii) The weighted median $r$ can be identified in $O(K)$ steps.
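Item (i) rests on two facts: within $J^*_k$, $k \ge 1$, the ratios $d_j/e_j$ are increasing (by convexity), so the partition into $J^1_k$, $J^2_k$, $J^3_k$ can be found by binary search; and since $e_j = a_j - a_{j-1}$, the contributions to $S^1$ and $S^2$ telescope into single differences of $a$ values. The median $r_k$ of such a sorted run is simply its middle position. A minimal sketch follows, assuming $J^*_k$ is a contiguous run of positions (which holds because steps 2(b) and 2(c) always keep a prefix or a suffix of it) and a hypothetical layout in which positions are counted from $j_k$ and `a(p)`, `c(p)` are evaluated only when needed.

```python
from typing import Callable, Tuple


def split_run(a: Callable[[int], float], c: Callable[[int], float],
              lo: int, hi: int, lam: float) -> Tuple[int, int, float, float]:
    """Partition the unassigned run lo..hi-1 (lo >= 1) of one sorted set J'_k.

    a(p), c(p) return the weight and cost of position p of J'_k (position 0
    is j_k). Returns (p1, p2, s1, s2) with J^1 = lo..p1-1, J^2 = p1..p2-1,
    J^3 = p2..hi-1, and s1, s2 the contributions of this set to S^1, S^2.
    """
    def ratio(p: int) -> float:
        # d_j / e_j for the difference variable at position p
        return (c(p) - c(p - 1)) / (a(p) - a(p - 1))

    def first(pred: Callable[[int], bool]) -> int:
        # Binary search: smallest p in [lo, hi) with pred(p), or hi if none.
        # Valid because the ratios increase along the run (convexity).
        left, right = lo, hi
        while left < right:
            mid = (left + right) // 2
            if pred(mid):
                right = mid
            else:
                left = mid + 1
        return left

    p1 = first(lambda p: ratio(p) >= lam)
    p2 = first(lambda p: ratio(p) > lam)
    # e_p = a(p) - a(p-1) telescopes, so each sum is one subtraction.
    s1 = a(p1 - 1) - a(lo - 1)
    s2 = a(p2 - 1) - a(p1 - 1)
    return p1, p2, s1, s2
```

Only $O(\log(hi - lo))$ ratios are evaluated, which matches the $O(\log J^*_k)$ effort claimed in (i).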

In addition, one may spend $O(K + I)$ operations going through step 4.

Thus, the overall complexity of algorithm KNAPSACK is bounded by

$$O\big[\log(N/(K+I))\,(K \log(J/K) + I)\big] \le O(K \log^2(J/K)) + O(I \log(N/I)).$$
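The final inequality can be checked directly. Writing $N = I + J$ and using $\sum_{k=1}^{K} J_k = J$ and $K \le J$, the following elementary bounds suffice:

```latex
\sum_{k=1}^{K} \log J_k \;\le\; K \log\frac{J}{K}
\qquad \text{(concavity of } \log\text{)},

\frac{N}{K+I} = \frac{J+I}{K+I} \;\le\; \frac{J}{K}
\qquad \text{(since } K(J+I) \le J(K+I) \iff KI \le JI\text{)},

\frac{N}{K+I} \;\le\; \frac{N}{I},

\log\frac{N}{K+I}\Big(K \log\frac{J}{K} + I\Big)
\;\le\; K \log^2\frac{J}{K} + I \log\frac{N}{I}.
```

The first bound is what turns the per-set costs $O(\log J_k)$ of item (i) into the term $K \log(J/K)$ per iteration.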